Abstract. The authors study statistical linear inverse problems in Hilbert spaces. Approximate
solutions are sought within a class of linear one-parameter regularization schemes, and the parameter
choice is crucial to control the root mean squared error. Here a variant of the Raus-Gfrerer rule is
analyzed, and it is shown that this parameter choice gives rise to error bounds in terms of oracle
inequalities, which in turn provide order optimal error bounds (up to logarithmic factors). These
bounds can only be established for solutions which obey a certain self-similarity structure. The proof
of the main result relies on some auxiliary error analysis for linear inverse problems under general
noise assumptions, and this may be of independent interest.
1. Introduction. In this study we introduce a new parameter choice strategy for
statistical linear inverse problems in Hilbert spaces. We consider the linear equation

$$y^\delta = Tx^\dagger + \delta\xi, \tag{1.1}$$

where $T : X \to Y$ is a compact linear operator between Hilbert spaces $X$ and $Y$,
the parameter $\delta > 0$ denotes the noise level, and $\xi$ stands for the additive noise, to
be specified later as Gaussian white noise, which leads to observations $y^\delta$. This is a
standard model considered in statistical inverse problems. By using the singular system
$\{s_j, u_j, v_j\}$ of $T$ to write $Tx = \sum_j s_j \langle x, u_j\rangle v_j$, $x \in X$, the above model (1.1) is seen to
be equivalent to the sequence space model

$$y_j = x_j + \delta\xi_j, \quad j = 1,2,\dots,$$

with observations $y_j = \langle y^\delta, v_j\rangle / s_j$; the noise term $\delta\xi_j$ is centered Gaussian with variance $\delta^2/s_j^2$.
The unknown solution $x^\dagger$ has coefficients $x_j$ with respect to the basis $u_j$, $j = 1,2,\dots$
This model is frequently analyzed, and we mention the recent survey [3]. In particular,
the minimax error is clearly understood if the solution sequence $x_j$, $j = 1,2,\dots$ belongs
to some Sobolev type ball. In that case, a series estimator $\hat x_k(y^\delta) = \sum_{j=1}^{k} c_j y_j u_j$
(with appropriately chosen weights $c_j$) is (almost) optimal.

The important question is how to choose the truncation level (parameter, model) $k$
based on the given data and the noise level $\delta$. Parameter choice in statistical inverse
problems, called model selection in this field, is an important issue, and we refer to [3]
for a survey on this. Only recently, the discrepancy principle, which is the most prominent
parameter choice in classical regularization theory, has been analyzed within the
statistical context in [2]. Here, for any estimator $\hat x = \hat x(y^\delta)$ it requires to achieve
that $\|T\hat x - y^\delta\| \asymp \delta$. Since the white noise $\xi$ is not an element in $Y$, the discrepancy
$\|T\hat x - y^\delta\|$ is not well-defined. Therefore, for statistical inverse problems, the traditional
discrepancy principle cannot be applied directly.
In order to make the discrepancy principle applicable to statistical inverse problems,
we may consider, instead, the symmetrized equation with $A := T^*T \ge 0$ and
$\zeta := T^*\xi$, as

$$z^\delta := T^*y^\delta = Ax^\dagger + \delta T^*\xi = Ax^\dagger + \delta\zeta. \tag{1.2}$$

Then, if the operator $A$ has finite trace, the new misfit $\|A\hat x - z^\delta\|$ is almost surely
finite, and it is tempting to require that

$$\|A\hat x(z^\delta) - z^\delta\| \asymp \delta, \tag{1.3}$$

which gives the discrepancy principle for the symmetrized equation. However, as was
pointed out in [2], this plain use of the discrepancy principle leads only to suboptimal
performance. Instead, the misfit $A\hat x(z^\delta) - z^\delta$ should be weighted, and if done accordingly,
this can yield optimal rates of reconstruction. To be specific we consider the
family of reconstructions

$$x_\alpha^\delta := (\alpha I + A)^{-1}T^*y^\delta, \quad \alpha > 0,$$

via Tikhonov regularization. The authors in [2] studied the modified discrepancy principle

$$\|(\lambda I + A)^{-1/2}(Ax_\alpha^\delta - z^\delta)\| \asymp \delta. \tag{1.4}$$

It is shown that an appropriate choice of $\lambda > 0$ yields order optimal reconstruction
in many cases. However, the choice of $\lambda$ requires knowledge of the unknown smoothness of the solution,
which turns the discrepancy principle into an a priori rule.

Instead, the authors in [12] considered the varying discrepancy principle

$$\|(\alpha I + A)^{-1/2}(Ax_\alpha^\delta - z^\delta)\| \asymp \delta \tag{1.5}$$

by setting $\lambda = \alpha$ in (1.4) to make the principle into an a posteriori one, and thus the
weight depends on the parameter $\alpha$ under consideration. The main achievement in [12]
is that this new principle may yield optimal order reconstruction (up to a logarithmic
factor). However, it became transparent that such a result holds only for solutions $x^\dagger$
which satisfy certain self-similarity properties. This has an intuitive explanation: For
large values of $\alpha$, and this is where the discrepancy principle starts, the misfit
is dominated by the large singular numbers $s_j$. However, the approximation order is
determined by all of the spectrum.

The varying discrepancy principle has another drawback. The regularization scheme, 
which is used to determine the candidate solutions $x_\alpha^\delta$, must have higher qualification
than given by the underlying smoothness in terms of general source conditions. For 
instance, if we use Tikhonov regularization, whose qualification is known to be 1, see 
[4], then the varying discrepancy principle gives order optimal reconstruction only for 
smoothness 'up to 1/2'. This effect, which is inherent in the discrepancy principle in 
classical regularization context, is called early saturation, and it can be overcome by 
turning from the discrepancy principle to the so-called Raus-Gfrerer rule (RG-rule). 



As Raus and Gfrerer proposed, instead of the discrepancy from (1.3) an additional
weight should be used, which results in the RG-rule

$$\|(\alpha I + A)^{-1}(Ax_\alpha^\delta - z^\delta)\| \asymp \delta. \tag{1.6}$$
This is the starting point for the present study: the application of the RG-rule within
the statistical context. It will be shown that an appropriate use of the RG-rule yields
order optimal results without the effect of early saturation. More precisely, we propose a
statistical version of the RG-rule and establish some oracle inequalities, provided that the
solution obeys some self-similarity. Oracle inequalities are widely used in statistics, see
[3]. An oracle inequality guarantees that the estimator has a risk of the same order as
that of the oracle. The oracle bound in particular implies that Tikhonov regularization
can achieve order optimal reconstruction up to order 1.

This paper is organized as follows. We first precisely introduce the context, and
then we state the main result with some discussion in Section 2. The proof of the
main result relies on preliminary results within the classical (deterministic noise)
setting given in Section 3, however under general noise assumptions. The results in
this context may be of independent interest. Finally, the proof of the main result is
given in Section 4.

2. Setup and main result. We shall use the same setup as in [2, 12]. However,
the parameter choice will be different.

2.1. Assumptions. We start with the description of the noise. We adapt
the notion of Gaussian white noise to the present case. Let $(\Omega, \mathcal F, \mathbb P)$ be a (complete)
probability space, and let $\mathbb E$ be the expectation with respect to $\mathbb P$.



Assumption 2.1 (Gaussian white noise). The noise $\xi = (\xi(y),\ y \in Y)$ in (1.1) is
a stochastic process, defined on $(\Omega, \mathcal F, \mathbb P)$ with the properties that

1. for each $y \in Y$ the random number $\xi(y) \in L_2(\Omega, \mathcal F, \mathbb P)$ is a centered Gaussian
random variable, and

2. for all $y, y' \in Y$ the covariance structure is $\mathbb E[\xi(y)\xi(y')] = \langle y, y'\rangle$.

As a consequence, the mapping $y \mapsto \xi(y)$ is linear, and we shall thus write $\xi(y) =
\langle \xi, y\rangle$; we refer to [5] for details.

The related Gaussian process $\zeta := T^*\xi$ has covariance $\mathbb E[\langle\zeta, w\rangle\langle\zeta, w'\rangle] = \langle w, Aw'\rangle$,
$w, w' \in X$, with the operator $A := T^*T$.

Assumption 2.2. The operator $A$ has finite trace $\operatorname{Tr}[A] < \infty$.

Under Assumption 2.2, Sazonov's Theorem, cf. [6], asserts that the element $\zeta = T^*\xi$
is a Gaussian random element in $X$ (almost surely). Therefore the equation

$$z^\delta = Ax^\dagger + \delta\zeta \tag{2.1}$$

is a well defined linear equation in $X$ (almost surely). This will be our main model
from now on.

Moreover, Assumption 2.2 implies that the following function is well defined; for
further properties we refer to [2].

Definition 2.1 (effective dimension). The function $\mathcal N(\lambda)$ defined as

$$\mathcal N(\lambda) = \mathcal N_A(\lambda) := \operatorname{Tr}\left[(A + \lambda I)^{-1}A\right], \quad \lambda > 0, \tag{2.2}$$

is called effective dimension of the operator $A$ under white noise.

Along with the effective dimension, as in [12] we introduce the decreasing function
$\varrho_{\mathcal N}(t)$ given by

$$\varrho_{\mathcal N}(t) := \frac{1}{\sqrt{t\,\mathcal N(t)}}, \quad t > 0, \tag{2.3}$$

and its companion

$$\Theta_{\varrho_{\mathcal N}}(t) := t\,\varrho_{\mathcal N}(t), \quad t > 0. \tag{2.4}$$

The latter function is continuous and strictly increasing, hence its inverse is well-defined.
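As a small numerical illustration of Definition 2.1 and of the functions in (2.3) and (2.4), the following sketch evaluates them for a diagonal operator. The spectrum $j^{-2}$, the truncation at 2000 eigenvalues, and the reading $\varrho_{\mathcal N}(t) = 1/\sqrt{t\,\mathcal N(t)}$ are assumptions made only for this illustration.

```python
import math

def effective_dimension(eigs, lam):
    """N(lam) = Tr[(A + lam I)^{-1} A] for a diagonal A with eigenvalues eigs."""
    return sum(t / (t + lam) for t in eigs)

def rho(eigs, t):
    """rho_N(t) = 1 / sqrt(t * N(t)) (assumed reading of (2.3))."""
    return 1.0 / math.sqrt(t * effective_dimension(eigs, t))

def theta(eigs, t):
    """Theta(t) = t * rho_N(t) = sqrt(t / N(t))."""
    return t * rho(eigs, t)

# Hypothetical spectrum with finite trace: eigenvalues j^{-2}, j = 1..2000.
eigs = [j ** -2.0 for j in range(1, 2001)]
grid = [1e-3, 1e-2, 1e-1, 1.0]
for lam in grid:
    print(lam, effective_dimension(eigs, lam), rho(eigs, lam), theta(eigs, lam))
```

On this grid the effective dimension and $\varrho_{\mathcal N}$ decrease while $\Theta_{\varrho_{\mathcal N}}$ increases, in line with the monotonicity statements above.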

We recall the notion of linear regularization, see e.g. Definition 2.2]. 
Definition 2.2 (linear regularization). A family of functions

$$g_\alpha : (0, \|A\|] \to \mathbb R, \quad 0 < \alpha \le \|A\|,$$

is called regularization if they are piecewise continuous in $\alpha$ and the following properties
hold:

1. For each $0 < t \le \|A\|$ we have that $|r_\alpha(t)| \to 0$ as $\alpha \to 0$;

2. There is a constant $\gamma_1$ such that $\sup_{0<t\le\|A\|} |r_\alpha(t)| \le \gamma_1$ for all $0 < \alpha \le \|A\|$;

3. There is a constant $\gamma_* \ge 1$ such that $\sup_{0<t\le\|A\|} \alpha|g_\alpha(t)| \le \gamma_*$ for all $0 < \alpha < \infty$,

where $r_\alpha(t) := 1 - tg_\alpha(t)$, $0 < t \le \|A\|$, denotes the residual function.

We further restrict the analysis to regularization schemes which are monotone, that is,

$$r_\alpha(t) \le r_\beta(t), \quad \text{for } 0 < \alpha \le \beta, \tag{2.5}$$

and

$$0 \le r_\alpha(t) \le 1, \quad \text{for } \alpha > 0. \tag{2.6}$$

Hence Item 2 in Definition 2.2 holds with $\gamma_1 = 1$, and also $0 \le tg_\alpha(t) \le 1$. We also
recall the following fact from [8, Lemma 2.3]: For $0 < \alpha \le \beta$ there holds

$$0 \le r_\beta(t) - r_\alpha(t) \le (1 + \gamma_*)\frac{t}{t+\alpha}\, r_\beta(t). \tag{2.7}$$

Indeed, it follows from (2.5) and (2.6) that

$$0 \le r_\beta(t) - r_\alpha(t) \le (1 - r_\alpha(t))\,r_\beta(t) = tg_\alpha(t)\,r_\beta(t).$$

The result now follows from the observation that $(t + \alpha)g_\alpha(t) \le 1 + \gamma_*$.
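The bound (2.7) can be checked numerically for a concrete scheme. The sketch below assumes Tikhonov regularization, $g_\alpha(t) = 1/(t+\alpha)$ and $r_\alpha(t) = \alpha/(t+\alpha)$, for which $\gamma_* = 1$ is admissible; the logarithmic grid is an arbitrary choice for illustration.

```python
def r(alpha, t):
    """Residual function of Tikhonov regularization: r_alpha(t) = alpha / (t + alpha)."""
    return alpha / (t + alpha)

GAMMA_STAR = 1.0  # alpha * g_alpha(t) = alpha / (t + alpha) <= 1

def check_27(alpha, beta, t):
    """Check 0 <= r_beta(t) - r_alpha(t) <= (1 + gamma_*) * t / (t + alpha) * r_beta(t)."""
    diff = r(beta, t) - r(alpha, t)
    bound = (1.0 + GAMMA_STAR) * t / (t + alpha) * r(beta, t)
    return 0.0 <= diff <= bound

grid = [10.0 ** k for k in range(-6, 3)]
ok = all(check_27(a, b, t) for a in grid for b in grid if a <= b for t in grid)
print(ok)
```

For this scheme the difference equals $t(\beta-\alpha)/((t+\alpha)(t+\beta))$, so the inequality even holds with constant 1 in place of $1+\gamma_*$.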

Having chosen an initial guess $x_0 \in X$ and a regularization $g_\alpha$, we construct the
approximate solutions

$$x_\alpha^\delta := x_0 - g_\alpha(A)(Ax_0 - z^\delta), \quad \text{and} \quad x_\alpha := x_0 - g_\alpha(A)(Ax_0 - z);$$

for the noise free case we use $z := Ax^\dagger$. Recall that the element $\zeta = T^*\xi$ is a Gaussian
random element in $X$ (almost surely). Therefore, we will use the root mean squared
error at a solution instance $x^\dagger$, given as

$$\left(\mathbb E\left[\|x^\dagger - x_\alpha^\delta\|^2\right]\right)^{1/2}, \quad \alpha, \delta > 0. \tag{2.8}$$

2.2. Parameter choice. For the stopping criterion we will consider the following
setup. Having chosen a constant $0 < q < 1$ we select the parameter $\alpha$ from the
geometric family

$$\Delta_q := \{\alpha_k,\ \alpha_k := q^k\alpha_0,\ k = 0, 1, 2, \dots\}. \tag{2.9}$$

For the statistical RG-rule we introduce the family of functions

$$s_\alpha(t) = \frac{\alpha}{t + \alpha}, \quad t, \alpha > 0, \tag{2.10}$$

which are the residual functions from Tikhonov regularization.

Definition 2.3 (statistical RG-rule). Given $\tau > 1$, $\eta > 0$ and $\kappa > 0$, let $\alpha_{RG}$ be
the largest parameter $\alpha \in \Delta_q$ for which either

$$\|s_\alpha(A)(Ax_\alpha^\delta - z^\delta)\| \le \tau(1 + \kappa)\frac{\delta}{\varrho_{\mathcal N}(\alpha)}, \tag{2.11}$$

or

$$\Theta_{\varrho_{\mathcal N}}(\alpha) \le \eta(1 + \kappa)\delta. \tag{2.12}$$

We will call the criteria (2.11) and (2.12) the regular stop and emergency stop,
respectively. Notice that the regular stop in Definition 2.3 can be viewed as the
Raus-Gfrerer rule applied to Lavrent'iev type regularization of the symmetrized equation (2.1).
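To make the stopping rule concrete, here is a minimal sketch of the statistical RG-rule in sequence space. It assumes a diagonal operator $A$ with eigenvalues `eigs`, the initial guess $x_0 = 0$, Tikhonov regularization $g_\alpha(t) = 1/(t+\alpha)$, and the reading $\varrho_{\mathcal N}(\alpha) = 1/\sqrt{\alpha\,\mathcal N(\alpha)}$. The function name `rg_rule`, the default constants, and the test spectrum are hypothetical choices for illustration and are not taken from the analysis.

```python
import math
import random

def rg_rule(eigs, z_delta, delta, alpha0=1.0, q=0.5, tau=1.5, eta=0.1, kmax=200):
    """Return the largest alpha = alpha0 * q**k passing the regular or emergency stop."""
    n0 = sum(t / (t + alpha0) for t in eigs)                  # N(alpha0)
    kappa = math.sqrt(8.0 * abs(math.log(1.0 / delta)) / n0)  # tuning parameter
    alpha = alpha0
    for k in range(kmax):
        alpha = alpha0 * q ** k
        n_eff = sum(t / (t + alpha) for t in eigs)            # N(alpha)
        rho = 1.0 / math.sqrt(alpha * n_eff)                  # rho_N(alpha), assumed form
        # With x0 = 0: A x_alpha^delta - z^delta = -r_alpha(A) z^delta,
        # and s_alpha = r_alpha = alpha / (t + alpha) for Tikhonov.
        misfit = math.sqrt(sum(((alpha / (t + alpha)) ** 2 * z) ** 2
                               for t, z in zip(eigs, z_delta)))
        regular = misfit <= tau * (1.0 + kappa) * delta / rho
        emergency = alpha * rho <= eta * (1.0 + kappa) * delta
        if regular or emergency:
            return alpha
    return alpha  # safeguard if kmax is too small

rng = random.Random(0)
delta = 1e-2
eigs = [j ** -2.0 for j in range(1, 201)]
x_true = [j ** -1.5 for j in range(1, 201)]
# zeta = T^* xi has covariance A, so its j-th coefficient has variance eigs[j].
z_delta = [t * x + delta * math.sqrt(t) * rng.gauss(0.0, 1.0)
           for t, x in zip(eigs, x_true)]
alpha_rg = rg_rule(eigs, z_delta, delta)
print(alpha_rg)
```

Scanning the grid from $\alpha_0$ downward and returning the first parameter that passes either test yields the largest such parameter, as required in Definition 2.3. The emergency stop guarantees termination because $\Theta_{\varrho_{\mathcal N}}(\alpha) \to 0$ as $\alpha \to 0$.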

2.3. Restricting the solution set. One important observation in the subsequent
analysis, in particular in Section 3, will be that the RG-rule as introduced in
§2.2 may fail for statistical problems (and also for bounded deterministic general noise),
if the solution element has abnormal spectral behavior relative to the operator $A$.
Therefore, we shall need the following restriction for the solution $x^\dagger$. To describe this
we use the spectral resolution $(E_t)_{0<t\le\|A\|}$ of the (compact) non-negative self-adjoint
operator $A$.

Assumption 2.3. There exist $c_1 > 1$, $0 < c_2 \le 1$ and $0 < t_0 \le \|A\|$ such that

$$\int_0^{\alpha} d\|E_t(x^\dagger - x_0)\|^2 \le c_1^2 \int_{c_2\alpha}^{\infty} r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2$$

for all $0 < \alpha \le t_0$.

The inequality in Assumption 2.3 with $c_2 = 1$ was introduced in [15] as a generalization
of a restricted form on $x^\dagger - x_0$ in [10] for the (iterated) Tikhonov regularization.

Example 2.1. For the $n$-times iterated Tikhonov regularization, we have $r_\alpha(t) =
\alpha^n/(t + \alpha)^n$. It is easy to see that

$$|r_\alpha(t)| \ge c_3\left(\frac{\alpha}{t}\right)^n \quad \text{for } t \ge c_2\alpha,$$

with $c_3 := (c_2/(1 + c_2))^n$. Therefore, in this case, Assumption 2.3 is equivalent to

$$\int_0^{\alpha} d\|E_t(x^\dagger - x_0)\|^2 \le c_4\,\alpha^{2n}\int_{c_2\alpha}^{\infty} t^{-2n}\, d\|E_t(x^\dagger - x_0)\|^2, \quad 0 < \alpha \le t_0.$$

This, with $c_2 = 1$, is the condition used in [10].

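Example 2.2. For the truncated singular value decomposition method we have

$$g_\alpha(t) = \begin{cases} 1/t, & t \ge \alpha, \\ 0, & t < \alpha, \end{cases} \quad \text{and} \quad r_\alpha(t) = \begin{cases} 0, & t \ge \alpha, \\ 1, & t < \alpha. \end{cases}$$

Thus Assumption 2.3 becomes

$$\int_0^{\alpha} d\|E_t(x^\dagger - x_0)\|^2 \le c_1^2\int_{c_2\alpha}^{\alpha} d\|E_t(x^\dagger - x_0)\|^2, \quad 0 < \alpha \le t_0.$$

We observe that

$$\frac{\int_{c_2\alpha}^{\alpha} d\|E_t(x^\dagger - x_0)\|^2}{\int_0^{\alpha} d\|E_t(x^\dagger - x_0)\|^2} = 1 - \frac{\|E_{c_2\alpha}(x^\dagger - x_0)\|^2}{\|E_\alpha(x^\dagger - x_0)\|^2}.$$

Therefore, for this scheme, Assumption 2.3 is equivalent to the existence of constants
$0 < c_2 < 1$, $0 < \theta < 1$ and $0 < t_0 \le \|A\|$ such that

$$\|E_{c_2\alpha}(x^\dagger - x_0)\| \le \theta\,\|E_\alpha(x^\dagger - x_0)\|, \quad \forall\, 0 < \alpha \le t_0. \tag{2.13}$$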



Example 2.3. For the asymptotical regularization we have $r_\alpha(t) = e^{-t/\alpha}$. Since
$e^{-t/\alpha} \ge e^{-1}$ for $c_2\alpha \le t \le \alpha$, it is easy to see that Assumption 2.3
holds if (2.13) is satisfied.

Example 2.4. For the Landweber iteration with $\|A\| = 1$, we have $r_\alpha(t) = (1 - t)^{[1/\alpha]}$,
where $[1/\alpha]$ denotes the largest integer that is not greater than $1/\alpha$. Observe
that for $0 \le t \le \alpha \le 1/2$ there holds $(1 - t)^{[1/\alpha]} \ge (1 - t)^{1/\alpha} \ge (1 - \alpha)^{1/\alpha} \ge 1/4$.
Therefore, Assumption 2.3 holds if (2.13) is satisfied.
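The lower bound $1/4$ for the Landweber residual function on $0 \le t \le \alpha \le 1/2$ can be confirmed numerically. The following sketch evaluates the residual at the worst point $t = \alpha$ on a hypothetical grid of parameters $\alpha = 1/(2k)$.

```python
import math

def landweber_residual(alpha, t):
    """r_alpha(t) = (1 - t)^[1/alpha] for the Landweber iteration with ||A|| = 1."""
    return (1.0 - t) ** math.floor(1.0 / alpha)

# The residual is decreasing in t, so t = alpha is the worst case on [0, alpha].
alphas = [0.5 / k for k in range(1, 200)]
worst = min(landweber_residual(a, a) for a in alphas)
print(worst >= 0.25)
```

The minimum is attained at $\alpha = 1/2$, where $(1-\alpha)^{1/\alpha} = 1/4$ exactly.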



2.4. Main result and discussion. The main result in this study is as follows.

Theorem 2.1. Let Assumptions 2.1-2.3 hold. Let $\alpha_{RG}$ be chosen according to the
statistical RG-rule with $\kappa = \sqrt{8|\log(1/\delta)|/\mathcal N(\alpha_0)}$. Then there is a constant $C$ such
that

$$\left(\mathbb E\left[\|x^\dagger - x_{\alpha_{RG}}^\delta\|^2\right]\right)^{1/2} \le C \inf_{0<\alpha\le\alpha_0}\left\{\|x_\alpha - x^\dagger\| + \frac{\delta\left(1 + \sqrt{|\log(1/\delta)|}\right)}{\Theta_{\varrho_{\mathcal N}}(\alpha)}\right\}.$$

The oracle inequality as established in Theorem 2.1 allows us to state the error bound
which is obtained under a known general source condition and by an a priori parameter
choice. We recall some notions.

Definition 2.4 (general source set). Given an index function $\psi$ that is continuous,
non-negative, and non-decreasing on $[0, \|A\|]$ with $\psi(0) = 0$, the set

$$H_\psi := \{x \in X : x = \psi(A)v \text{ for some } \|v\| \le 1\},$$

is called a general source set.

For solutions $x^\dagger$ which belong to some source set, the bias $\|x_\alpha - x^\dagger\|$ can be bounded
under the assumption that the chosen regularization has enough qualification, see e.g.

® 

Definition 2.5 (qualification). The regularization is said to have qualification $\psi$
if there is a constant $\gamma < \infty$ such that

$$|r_\alpha(t)|\,\psi(t) \le \gamma\,\psi(\alpha), \quad \alpha > 0.$$

Notice that $x^\dagger - x_\alpha = r_\alpha(A)(x^\dagger - x_0)$. If the regularization has qualification $\psi$ and
$x^\dagger - x_0 \in H_\psi$, then

$$\|x^\dagger - x_\alpha\| \le \gamma\,\psi(\alpha)\|v\| \le \gamma\,\psi(\alpha).$$

By choosing $\alpha_\delta > 0$ to be the root of the equation

$$\Theta_{\varrho_{\mathcal N}\psi}(\alpha) := \Theta_{\varrho_{\mathcal N}}(\alpha)\,\psi(\alpha) = \delta\left(1 + \sqrt{|\log(1/\delta)|}\right),$$

we can use the oracle inequality in Theorem 2.1 to obtain the following result.

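Corollary 2.1. Let Assumptions 2.1-2.3 hold, and let $\alpha_{RG}$ be chosen according
to the statistical RG-rule with $\kappa = \sqrt{8|\log(1/\delta)|/\mathcal N(\alpha_0)}$. If the regularization has
qualification $\psi$ then

$$\sup_{x^\dagger - x_0 \in H_\psi}\left(\mathbb E\left[\|x^\dagger - x_{\alpha_{RG}}^\delta\|^2\right]\right)^{1/2} \le C\,\psi\left(\Theta_{\varrho_{\mathcal N}\psi}^{-1}\left(\delta\left(1 + \sqrt{|\log(1/\delta)|}\right)\right)\right).$$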



Thus, up to a logarithmic factor, the rate in Corollary 2.1 coincides with the one
from [SJ Theorem 1], which is known to be order optimal in many cases. 

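We conclude this section with an outline of the proof of Theorem 2.1. The basic
idea is to reduce the argument to the one for bounded deterministic noise. The bound
in Theorem 2.1 uses the effective dimension $\mathcal N$, or more precisely the function $\varrho_{\mathcal N}$.
This function naturally appears when considering the average performance of the noise
under the weight $s_\alpha^{1/2}(A)$ because

$$\left(\mathbb E\left[\|s_\alpha^{1/2}(A)\zeta\|^2\right]\right)^{1/2} = \left(\operatorname{Tr}[s_\alpha(A)A]\right)^{1/2} = \sqrt{\alpha\,\mathcal N(\alpha)} = \frac{1}{\varrho_{\mathcal N}(\alpha)}, \quad \alpha > 0. \tag{2.14}$$

Therefore, we choose a tuning parameter $\kappa$, as specified in Theorem 2.1, and define the
set

$$Z_\kappa := \left\{\zeta : \|s_\alpha^{1/2}(A)\zeta\| \le (1 + \kappa)\frac{1}{\varrho_{\mathcal N}(\alpha)}, \quad \bar\alpha \le \alpha \in \Delta_q\right\}, \tag{2.15}$$

where $\bar\alpha$ is the largest number in $\Delta_q$ satisfying

$$\Theta_{\varrho_{\mathcal N}}(\bar\alpha) \le \eta(1 + \kappa)\delta.$$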



Let $Z_\kappa^c$ denote the complement of $Z_\kappa$ in $X$. Since $X = Z_\kappa \cup Z_\kappa^c$, we can use the
Cauchy-Schwarz inequality to derive that

$$\left(\mathbb E\left[\|x^\dagger - x_\alpha^\delta\|^2\right]\right)^{1/2} \le \sup_{\zeta \in Z_\kappa}\|x^\dagger - x_\alpha^\delta\| + \left(\mathbb E\left[\|x^\dagger - x_\alpha^\delta\|^4\right]\right)^{1/4}\left(\mathbb P[Z_\kappa^c]\right)^{1/4}, \tag{2.16}$$

see O Proposition 3]. We will estimate the two terms on the right side of (2.16) with
$\alpha = \alpha_{RG}$. Uniformly for $\zeta \in Z_\kappa$ the first term on the right can be considered as an
error estimate under bounded deterministic noise; and we will show in Section 3 that
it can be bounded by the right hand side of the oracle inequality in Theorem 2.1. This
analysis may be of independent interest. In Section 4 we will use a concentration
inequality for Gaussian elements in Hilbert space to show that the second term on the
right in (2.16) is negligible; this is enough for us to complete the proof of Theorem 2.1.
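For the reader's convenience, the splitting behind (2.16) can be spelled out as follows. This is a sketch of the standard argument, written with the shorthand $e := \|x^\dagger - x_\alpha^\delta\|$ and the indicator $\mathbf 1_Z$ of an event $Z$ (both introduced here only for brevity); it is not quoted from the cited proposition.

```latex
\begin{align*}
\mathbb E[e^2]
  &= \mathbb E\bigl[e^2 \mathbf 1_{Z_\kappa}\bigr] + \mathbb E\bigl[e^2 \mathbf 1_{Z_\kappa^c}\bigr] \\
  &\le \sup_{\zeta \in Z_\kappa} e^2
     + \bigl(\mathbb E[e^4]\bigr)^{1/2}\bigl(\mathbb P[Z_\kappa^c]\bigr)^{1/2}
     && \text{(Cauchy-Schwarz on the second term)}, \\
\bigl(\mathbb E[e^2]\bigr)^{1/2}
  &\le \sup_{\zeta \in Z_\kappa} e
     + \bigl(\mathbb E[e^4]\bigr)^{1/4}\bigl(\mathbb P[Z_\kappa^c]\bigr)^{1/4}
     && \text{(since } \sqrt{a+b} \le \sqrt a + \sqrt b\text{)}.
\end{align*}
```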



3. Auxiliary results for bounded noise. The situation for bounded deterministic
noise which resembles the Gaussian white noise case is regularization under some
specifically chosen weighted noise. We recall the function $s_\alpha$ from (2.10). As could be
seen from the set $Z_\kappa$ in (2.15), the appropriate setup will be as follows.

Assumption 3.1. There is a function $\alpha \mapsto \delta(\alpha) > 0$ defined on $(0, \infty)$ that is
non-decreasing, while $\alpha \mapsto \delta(\alpha)/\sqrt\alpha$ is non-increasing, such that the noise $\zeta$ obeys

$$\delta\,\|s_\alpha^{1/2}(A)\zeta\| \le \delta(\alpha), \quad \bar\alpha \le \alpha \in \Delta_q, \tag{3.1}$$

where $\bar\alpha \in \Delta_q$ is the largest parameter such that $\bar\alpha \le \eta\,\delta(\bar\alpha)$ with $\eta > 0$ being a given
small number.

Because $\alpha \mapsto \delta(\alpha)/\sqrt\alpha$ is non-increasing and $\alpha \mapsto \sqrt\alpha$ is strictly increasing, it is
easy to see that $\bar\alpha$ is well-defined.

Remark 3.1. The setup in Assumption 3.1 on noise covers a variety of cases which
have been subsumed under the notion of general noise assumptions, we refer to [14, 1].
Specifically, let us consider the following situation. Suppose that the noise $\zeta$ allows for
a noise bound for some parameter $\mu$ with

$$\|A^{-\mu}\zeta\| \le 1. \tag{3.2}$$

In this case we can bound

$$\delta\,\|s_\alpha^{1/2}(A)\zeta\| \le \delta\,\|s_\alpha^{1/2}(A)A^{\mu}\|\,\|A^{-\mu}\zeta\| \le \|s_\alpha^{1/2}(A)A^{\mu}\|\,\delta.$$

It is easily verified that the operator norms $\|s_\alpha^{1/2}(A)A^{\mu}\|$ are uniformly bounded for
$\alpha > 0$ if and only if $0 \le \mu \le 1/2$. In this range we easily obtain that

$$\|s_\alpha^{1/2}(A)A^{\mu}\| \le \alpha^{\mu}, \quad \alpha > 0.$$

The two limiting cases are $\mu = 0$, where we assume $\|\zeta\| \le 1$, which corresponds
to large noise, and $\mu = 1/2$, where we assume $\|A^{-1/2}\zeta\| = \|\xi\| \le 1$, which corresponds
to the usual noise assumption in linear inverse problems in Hilbert spaces. In any
of the cases $0 \le \mu \le 1/2$ we get a bounding function $\delta(\alpha) = \delta\alpha^{\mu}$, which obeys the
requirements made in Assumption 3.1.
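The scalar inequality behind the operator norm bound in Remark 3.1, namely $s_\alpha^{1/2}(t)\,t^{\mu} \le \alpha^{\mu}$ for $0 \le \mu \le 1/2$, can be checked numerically. The sketch below uses hypothetical logarithmic grids for $t$ and $\alpha$, and a small relative tolerance to absorb floating point rounding.

```python
import math

def weighted_norm(alpha, mu, ts):
    """Maximum over the grid ts of s_alpha(t)**(1/2) * t**mu, with s_alpha(t) = alpha/(t+alpha)."""
    return max(math.sqrt(alpha / (t + alpha)) * t ** mu for t in ts)

ts = [10.0 ** (k / 4.0) for k in range(-32, 9)]   # t from 1e-8 to 1e2
alphas = [10.0 ** k for k in range(-6, 1)]        # alpha from 1e-6 to 1

def bound_holds(mu):
    """Check sup_t s_alpha(t)**(1/2) * t**mu <= alpha**mu on the grids above."""
    return all(weighted_norm(a, mu, ts) <= a ** mu * (1.0 + 1e-12) for a in alphas)

for mu in (0.0, 0.25, 0.5):
    print(mu, bound_holds(mu))
```

The bound follows from $t + \alpha \ge t^{2\mu}\alpha^{1-2\mu}$ for $0 \le \mu \le 1/2$; for larger exponents the same check fails on this grid.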



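Let $\bar\alpha \in \Delta_q$ be defined as in Assumption 3.1, i.e. $\bar\alpha \in \Delta_q$ is the largest parameter
such that $\bar\alpha \le \eta\,\delta(\bar\alpha)$.

Definition 3.1 (RG-rule). Given $\tau > 1$ and $\eta > 0$, we define $\alpha_* \in \Delta_q$ to be the
largest parameter such that

$$\alpha_* \ge \bar\alpha \quad \text{and} \quad \|s_{\alpha_*}(A)(Ax_{\alpha_*}^\delta - z^\delta)\| \le \tau\,\delta(\alpha_*); \tag{3.3}$$

if such $\alpha_*$ does not exist, we define $\alpha_* := \bar\alpha$.

We notice that the norm in the above criterion can be rewritten as

$$\|s_\alpha(A)(Ax_\alpha^\delta - z^\delta)\| = \|s_\alpha(A)r_\alpha(A)(Ax_0 - z^\delta)\|.$$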

3.1. Properties of the RG-rule. We give some technical consequences of the
stopping criterion which will be used later.

Lemma 3.1. Let $\alpha \in \Delta_q$ be any parameter such that $\alpha > \alpha_*$. Then there holds

$$\frac{\delta(\alpha)}{\alpha} \le \frac{1}{\tau - 1}\|x_\alpha - x^\dagger\|.$$

Proof. Since $\alpha > \alpha_*$, by the definition of $\alpha_*$ we must have

$$\tau\,\delta(\alpha) < \|s_\alpha(A)r_\alpha(A)(Ax_0 - z^\delta)\|.$$

Therefore, it follows from Assumption 3.1 that

$$\begin{aligned}
\tau\,\delta(\alpha) &\le \|s_\alpha(A)r_\alpha(A)(z - z^\delta)\| + \|s_\alpha(A)r_\alpha(A)(Ax_0 - z)\| \\
&\le \|s_\alpha^{1/2}(A)r_\alpha(A)\|\,\delta(\alpha) + \|s_\alpha(A)A\|\,\|x_\alpha - x^\dagger\|.
\end{aligned}$$

Since $0 \le s_\alpha^{1/2}(t)r_\alpha(t) \le 1$ and $0 \le s_\alpha(t)t \le \alpha$, we have $\|s_\alpha^{1/2}(A)r_\alpha(A)\| \le 1$ and
$\|s_\alpha(A)A\| \le \alpha$. Consequently

$$(\tau - 1)\,\delta(\alpha) \le \alpha\,\|x_\alpha - x^\dagger\|,$$

which gives the estimate. □

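Lemma 3.2. Let the parameter $\alpha_*$ be chosen by the RG-rule in Definition 3.1.
Then

$$\|s_{\alpha_*}(A)r_{\alpha_*}(A)(Ax_0 - z)\| \le \gamma_0\,\delta(\alpha_*),$$

where $\gamma_0 := \max\{1 + \tau,\ \eta\|x^\dagger - x_0\|\}$.

Proof. If $\alpha_* = \bar\alpha$, then it follows from the definition of $\bar\alpha$ that $\alpha_* \le \eta\,\delta(\alpha_*)$.
Consequently

$$\begin{aligned}
\|s_{\alpha_*}(A)r_{\alpha_*}(A)(Ax_0 - z)\| &= \|s_{\alpha_*}(A)r_{\alpha_*}(A)A(x^\dagger - x_0)\| \\
&\le \alpha_*\|x^\dagger - x_0\| \le \eta\,\|x^\dagger - x_0\|\,\delta(\alpha_*).
\end{aligned}$$

Otherwise we have that $\alpha_* > \bar\alpha$. Then by the definition of $\alpha_*$ we have

$$\begin{aligned}
\|s_{\alpha_*}(A)r_{\alpha_*}(A)(Ax_0 - z)\|
&\le \|s_{\alpha_*}(A)r_{\alpha_*}(A)(z - z^\delta)\| + \|s_{\alpha_*}(A)r_{\alpha_*}(A)(Ax_0 - z^\delta)\| \\
&\le \|s_{\alpha_*}^{1/2}(A)r_{\alpha_*}(A)\|\,\delta(\alpha_*) + \tau\,\delta(\alpha_*) \le (1 + \tau)\,\delta(\alpha_*),
\end{aligned}$$

and the proof is complete. □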



3.2. Auxiliary inequalities: The impact of Assumption 2.3. The following
inequalities may be of general interest. The first one goes back to pj |9], see also 
Lemma 2.4]. 

Lemma 3.3. For $0 < \alpha \le \beta$ we have

$$\|x_\beta - x_\alpha\| \le \frac{1 + \gamma_*}{\sqrt\alpha}\,\|A^{1/2}s_\beta^{1/2}(A)r_\beta(A)(x^\dagger - x_0)\|.$$

Proof. We first notice that $x_\alpha - x_\beta = (r_\beta(A) - r_\alpha(A))(x^\dagger - x_0)$. The bound
established in (2.7) yields that

$$\begin{aligned}
\|x_\beta - x_\alpha\| &= \|(r_\beta(A) - r_\alpha(A))(x^\dagger - x_0)\| \\
&\le (1 + \gamma_*)\,\|A(\alpha I + A)^{-1}r_\beta(A)(x^\dagger - x_0)\| \\
&= \frac{1 + \gamma_*}{\alpha}\,\|As_\alpha(A)r_\beta(A)(x^\dagger - x_0)\|.
\end{aligned}$$

We may write

$$As_\alpha(A) = A^{1/2}s_\alpha^{1/2}(A)\; s_\alpha^{1/2}(A)s_\beta^{-1/2}(A)\; A^{1/2}s_\beta^{1/2}(A).$$

Observing that $0 \le s_\alpha^{1/2}(t)\,t^{1/2} \le \sqrt\alpha$ and $s_\alpha(t) \le s_\beta(t)$ for $t > 0$, we have that
$\|s_\alpha^{1/2}(A)A^{1/2}\| \le \sqrt\alpha$ and $\|s_\alpha^{1/2}(A)s_\beta^{-1/2}(A)\| \le 1$. Therefore

$$\|As_\alpha(A)r_\beta(A)(x^\dagger - x_0)\| \le \sqrt\alpha\,\|A^{1/2}s_\beta^{1/2}(A)r_\beta(A)(x^\dagger - x_0)\|,$$

which allows us to complete the proof. □

The bound from Lemma 3.3 does not suffice, and we need the following strengthening,
where Assumption 2.3 is crucial.

Lemma 3.4. Suppose that Assumption 2.3 holds true. Then there is a constant
$C < \infty$ such that for $0 < \alpha \le \alpha_0$ there holds

$$\|A^{1/2}s_\alpha^{1/2}(A)r_\alpha(A)(x^\dagger - x_0)\| \le \frac{C}{\sqrt\alpha}\,\|s_\alpha(A)r_\alpha(A)A(x^\dagger - x_0)\|.$$

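Proof. We use spectral calculus to write

$$\|A^{1/2}s_\alpha^{1/2}(A)r_\alpha(A)(x^\dagger - x_0)\|^2 = I_1(\alpha) + I_2(\alpha),$$

where

$$I_1(\alpha) := \int_\alpha^\infty ts_\alpha(t)r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2, \qquad
I_2(\alpha) := \int_0^\alpha ts_\alpha(t)r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2.$$

We first bound $I_1$. For $t \ge \alpha$ we have that $\alpha(t + \alpha) \le 2\alpha t$, thus $1 \le \frac{2}{\alpha}ts_\alpha(t)$, yielding

$$I_1(\alpha) \le \frac{2}{\alpha}\int_\alpha^\infty t^2s_\alpha^2(t)r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2 \le \frac{2}{\alpha}\|s_\alpha(A)r_\alpha(A)A(x^\dagger - x_0)\|^2.$$

To estimate $I_2(\alpha)$ we will use Assumption 2.3. We will consider two cases: $0 < \alpha \le t_0$
and $t_0 < \alpha \le \alpha_0$.

When $0 < \alpha \le t_0$, we use Assumption 2.3 to obtain from $ts_\alpha(t) \le \alpha$ that

$$I_2(\alpha) \le \alpha\int_0^\alpha d\|E_t(x^\dagger - x_0)\|^2 \le c_1^2\,\alpha\int_{c_2\alpha}^\infty r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2.$$

Since $t/(t + \alpha) \ge c_2/(1 + c_2)$ for $t \ge c_2\alpha$, we further obtain

$$\begin{aligned}
I_2(\alpha) &\le \frac{c_1^2(1 + c_2)^2}{c_2^2}\,\alpha\int_{c_2\alpha}^\infty \frac{t^2}{(t + \alpha)^2}\,r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2 \\
&= \frac{c_1^2(1 + c_2)^2}{c_2^2\,\alpha}\int_{c_2\alpha}^\infty s_\alpha^2(t)r_\alpha^2(t)t^2\, d\|E_t(x^\dagger - x_0)\|^2 \\
&\le \frac{c_1^2(1 + c_2)^2}{c_2^2\,\alpha}\,\|s_\alpha(A)r_\alpha(A)A(x^\dagger - x_0)\|^2.
\end{aligned}$$

Now we consider the case $t_0 < \alpha \le \alpha_0$. We write $I_2(\alpha) = I_2^{(1)}(\alpha) + I_2^{(2)}(\alpha)$, where

$$I_2^{(1)}(\alpha) := \int_0^{t_0} ts_\alpha(t)r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2, \qquad
I_2^{(2)}(\alpha) := \int_{t_0}^{\alpha} ts_\alpha(t)r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2.$$

We can bound, by using Assumption 2.3, the term $I_2^{(1)}(\alpha)$ as

$$I_2^{(1)}(\alpha) \le c_1^2\,\alpha\int_{c_2t_0}^\infty r_{t_0}^2(t)\, d\|E_t(x^\dagger - x_0)\|^2.$$

Since $t_0 < \alpha$ implies $r_{t_0}(t) \le r_\alpha(t)$, we have

$$I_2^{(1)}(\alpha) \le c_1^2\,\alpha\int_{c_2t_0}^\infty r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2.$$

Observing that for $t \ge c_2t_0$ there holds $\frac{t}{t + \alpha} \ge \frac{t}{t + \alpha_0} \ge \frac{c_2t_0}{c_2t_0 + \alpha_0}$, we further obtain

$$\begin{aligned}
I_2^{(1)}(\alpha) &\le c_1^2\left(\frac{c_2t_0 + \alpha_0}{c_2t_0}\right)^2\alpha\int_{c_2t_0}^\infty \frac{t^2}{(t + \alpha)^2}\,r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2 \\
&= \frac{c_1^2}{\alpha}\left(\frac{c_2t_0 + \alpha_0}{c_2t_0}\right)^2\int_{c_2t_0}^\infty s_\alpha^2(t)r_\alpha^2(t)t^2\, d\|E_t(x^\dagger - x_0)\|^2 \\
&\le \frac{c_1^2}{\alpha}\left(\frac{c_2t_0 + \alpha_0}{c_2t_0}\right)^2\|s_\alpha(A)r_\alpha(A)A(x^\dagger - x_0)\|^2.
\end{aligned}$$

To bound $I_2^{(2)}$, we observe that for $t_0 \le t \le \alpha$ there holds $1 \le \frac{\alpha_0 + t_0}{t_0\,\alpha}\,ts_\alpha(t)$.
Consequently

$$\begin{aligned}
I_2^{(2)}(\alpha) &\le \frac{\alpha_0 + t_0}{t_0\,\alpha}\int_{t_0}^{\alpha} t^2s_\alpha^2(t)r_\alpha^2(t)\, d\|E_t(x^\dagger - x_0)\|^2 \\
&\le \frac{\alpha_0 + t_0}{t_0\,\alpha}\,\|s_\alpha(A)r_\alpha(A)A(x^\dagger - x_0)\|^2.
\end{aligned}$$

Combining the above estimates we therefore obtain the desired bound with

$$C := \left(2 + \frac{c_1^2(1 + c_2)^2}{c_2^2} + \frac{\alpha_0 + t_0}{t_0} + c_1^2\left(\frac{c_2t_0 + \alpha_0}{c_2t_0}\right)^2\right)^{1/2}. \qquad □$$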



We summarize the results from Lemma 3.3 and Lemma 3.4 as follows.

Corollary 3.1. Let Assumption 2.3 hold. Then there is a constant $C < \infty$ such
that for all $0 < \alpha \le \beta \le \alpha_0$ there holds

$$\|x_\beta - x_\alpha\| \le \frac{C}{\sqrt{\alpha\beta}}\,\|s_\beta(A)r_\beta(A)A(x^\dagger - x_0)\|.$$

3.3. Deterministic oracle inequality. In this section we state the main auxiliary
result for bounded deterministic noise, as this seems to be of independent interest.

Theorem 3.1. Let Assumptions 2.3 and 3.1 hold, and let the parameter $\alpha_*$
be chosen by the RG-rule starting with $\alpha_0$. Then there holds the oracle inequality, i.e.
there is a constant $C$ such that

$$\|x^\dagger - x_{\alpha_*}^\delta\| \le C\inf_{0<\alpha\le\alpha_0}\left\{\|x_\alpha - x^\dagger\| + \frac{\delta(\alpha)}{\alpha}\right\}. \tag{3.4}$$



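Proof. We first derive some preparatory results. Observing that $x^\dagger - x_\alpha =
r_\alpha(A)(x^\dagger - x_0)$, we have from (2.5) that

$$\|x^\dagger - x_\alpha\| \le \|x^\dagger - x_\beta\|, \quad \forall\, 0 < \alpha \le \beta. \tag{3.5}$$

By the conditions on $g_\alpha$ we have

$$\frac{g_\alpha(t)}{s_\alpha^{1/2}(t)} = \frac{1}{\sqrt\alpha}\sqrt{g_\alpha(t)}\sqrt{g_\alpha(t)(\alpha + t)} \le \frac{\sqrt{\gamma_*(1 + \gamma_*)}}{\alpha}.$$

Therefore, with $c_* := \sqrt{\gamma_*(1 + \gamma_*)}$ we have

$$\|x^\dagger - x_\alpha^\delta\| \le \|x^\dagger - x_\alpha\| + \|g_\alpha(A)(z - z^\delta)\|
\le \|x^\dagger - x_\alpha\| + \frac{c_*}{\alpha}\,\|s_\alpha^{1/2}(A)(z - z^\delta)\|. \tag{3.6}$$

It then follows from Assumption 3.1 that

$$\|x^\dagger - x_\alpha^\delta\| \le \|x^\dagger - x_\alpha\| + c_*\frac{\delta(\alpha)}{\alpha}. \tag{3.7}$$

Next we will prove the oracle inequality in two steps. We first restrict the oracle
bound to $\alpha \in \Delta_q$, and we show that

$$\|x^\dagger - x_{\alpha_*}^\delta\| \le C\inf_{\alpha\in\Delta_q}\left\{\|x_\alpha - x^\dagger\| + \frac{\delta(\alpha)}{\alpha}\right\}. \tag{3.8}$$

In this case we shall distinguish the cases $\alpha > \alpha_*$ and $\alpha \le \alpha_*$, respectively.

Case $\alpha > \alpha_*$. We first have from (3.7), (3.5) and the monotonicity of $\alpha \mapsto \delta(\alpha)$ that

$$\|x^\dagger - x_{\alpha_*}^\delta\| \le \|x_{\alpha_*} - x^\dagger\| + c_*\frac{\delta(\alpha_*)}{\alpha_*}
\le \|x_\alpha - x^\dagger\| + \frac{c_*}{q}\,\frac{\delta(\alpha_*/q)}{\alpha_*/q}.$$

Since $\alpha, \alpha_* \in \Delta_q$, we have $\alpha_*/q \in \Delta_q$ and $\alpha \ge \alpha_*/q > \alpha_*$. Then we can
conclude, by using Lemma 3.1 and (3.5), that

$$\|x^\dagger - x_{\alpha_*}^\delta\| \le \|x_\alpha - x^\dagger\| + \frac{c_*}{q(\tau - 1)}\|x_{\alpha_*/q} - x^\dagger\|
\le \left(1 + \frac{c_*}{q(\tau - 1)}\right)\|x_\alpha - x^\dagger\|.$$

Case $\alpha \le \alpha_*$. We actually use Assumption 2.3 and its consequences. Based on Corollary 3.1
and Lemma 3.2 we conclude in this case that there is a constant
$C < \infty$ with

$$\begin{aligned}
\|x_{\alpha_*} - x^\dagger\| &\le \|x_\alpha - x^\dagger\| + \|x_{\alpha_*} - x_\alpha\| \\
&\le \|x_\alpha - x^\dagger\| + \frac{C}{\sqrt{\alpha\alpha_*}}\,\|s_{\alpha_*}(A)r_{\alpha_*}(A)(Ax_0 - z)\| \\
&\le \|x_\alpha - x^\dagger\| + C\gamma_0\frac{\delta(\alpha_*)}{\sqrt{\alpha\alpha_*}}.
\end{aligned}$$

Consequently, we deduce, using the bound (3.7) and that $\alpha \mapsto \delta(\alpha)/\sqrt\alpha$ is
non-increasing, that

$$\begin{aligned}
\|x^\dagger - x_{\alpha_*}^\delta\| &\le \|x_{\alpha_*} - x^\dagger\| + c_*\frac{\delta(\alpha_*)}{\alpha_*} \\
&\le \|x_\alpha - x^\dagger\| + C\gamma_0\frac{\delta(\alpha_*)}{\sqrt{\alpha\alpha_*}} + c_*\frac{\delta(\alpha_*)}{\alpha_*} \\
&\le (C\gamma_0 + c_*)\left(\|x_\alpha - x^\dagger\| + \frac{\delta(\alpha)}{\alpha}\right).
\end{aligned}$$

Finally, we show the oracle inequality in its full generality. To this end, let
$0 < \alpha \le \alpha_0$ be any number. Then there is $j \in \mathbb N$ such that $\alpha_j \le \alpha < \alpha_j/q$. By using (3.5),
the fact that $\alpha \mapsto \delta(\alpha)$ is increasing, and the fact that $\alpha \mapsto \delta(\alpha)/\alpha$ is decreasing, we
obtain

$$\begin{aligned}
\|x_\alpha - x^\dagger\| + \frac{\delta(\alpha)}{\alpha} &\ge \|x_{\alpha_j} - x^\dagger\| + \frac{\delta(\alpha_j/q)}{\alpha_j/q} \\
&\ge \|x_{\alpha_j} - x^\dagger\| + q\,\frac{\delta(\alpha_j)}{\alpha_j} \\
&\ge q\inf_{\beta\in\Delta_q}\left\{\|x_\beta - x^\dagger\| + \frac{\delta(\beta)}{\beta}\right\}.
\end{aligned}$$

Since $0 < \alpha \le \alpha_0$ is arbitrary, we obtain

$$\inf_{0<\alpha\le\alpha_0}\left\{\|x_\alpha - x^\dagger\| + \frac{\delta(\alpha)}{\alpha}\right\} \ge q\inf_{\alpha\in\Delta_q}\left\{\|x_\alpha - x^\dagger\| + \frac{\delta(\alpha)}{\alpha}\right\}.$$

The proof is therefore complete. □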



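3.4. Discussion. (a) From Lemma 3.1 and (3.5) it follows that

$$\frac{\delta(\alpha_*/q)}{\alpha_*/q} \le \frac{1}{\tau - 1}\|x_{\alpha_*/q} - x^\dagger\| \le \frac{1}{\tau - 1}\|x_0 - x^\dagger\|.$$

Since $\alpha \mapsto \delta(\alpha)$ is non-decreasing, we obtain

$$\frac{\delta(\alpha_*)}{\alpha_*} \le \frac{1}{q(\tau - 1)}\|x_0 - x^\dagger\|.$$

If in the definition of $\bar\alpha$ we take $0 < \eta < q(\tau - 1)/\|x_0 - x^\dagger\|$, then we always have $\alpha_* > \bar\alpha$.
Therefore, the RG-rule in Definition 3.1 simply reduces to the form: $\alpha_*$ is the largest
parameter in $\Delta_q$ such that

$$\|s_{\alpha_*}(A)(Ax_{\alpha_*}^\delta - z^\delta)\| \le \tau\,\delta(\alpha_*).$$

The oracle inequality in Theorem 3.1 still holds for this simplified parameter choice
rule.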



(b) The oracle inequality established in Theorem 3.1 can be used to yield error
bounds when the solution $x^\dagger$ has smoothness given in terms of general source conditions,
i.e., if $x^\dagger - x_0$ belongs to some source set introduced in Definition 2.4. To see this, we
assume that the regularization has qualification $\psi$ as in Definition 2.5 and $x^\dagger - x_0 \in
H_\psi$. We also assume, as introduced in Remark 3.1, that the noise can be bounded as
$\|A^{-\mu}\zeta\| \le 1$, which results in $\delta(\alpha) = \delta\alpha^{\mu}$ for $0 \le \mu \le 1/2$. Then, for the parameter
$\alpha_*$ determined by the RG-rule in Definition 3.1, it follows from Theorem 3.1 that

$$\|x^\dagger - x_{\alpha_*}^\delta\| \le C\inf_{0<\alpha\le\alpha_0}\left\{\psi(\alpha) + \delta\alpha^{\mu - 1}\right\}.$$

Associated to the smoothness $\psi$, let $\Theta_{\mu,\psi}(t) := t^{1-\mu}\psi(t)$, $t > 0$, which is a strictly
increasing function. Given $\delta > 0$ we assign $\alpha_\delta > 0$ such that $\Theta_{\mu,\psi}(\alpha_\delta) = \delta$. Then we
can conclude that

$$\|x_{\alpha_*}^\delta - x^\dagger\| \le C\left(\psi(\alpha_\delta) + \delta\alpha_\delta^{\mu - 1}\right) \le 2C\,\psi\left(\Theta_{\mu,\psi}^{-1}(\delta)\right),$$



which was shown to be order optimal for x^ with the above smoothness in [TU Theorem 
4]. Thus, the present results cover part of the analysis carried out in [2]; they extend the
stopping criteria studied there to the RG-rule, and hence this relates to [13]. However,
the above approach is limited. First, the case of small noise, i.e., when $-1/2 \le \mu < 0$,
cannot be covered. Secondly, the oracle inequality is seen to hold only for those
solutions $x^\dagger$ satisfying Assumption 2.3.
4. Proof of the main result. The proof of Theorem 2.1 will be carried out in
several steps, similar to the one in the recent studies [2, 12]. Our starting point is the
inequality (2.16). Recall that $Z_\kappa$ is the set defined by (2.15), i.e. $Z_\kappa \subset X$ consists of
those realizations of the noise $\zeta$ obeying Assumption 3.1 along the sequence $\alpha_0, \dots, \bar\alpha$
with

$$\delta(\alpha) := (1 + \kappa)\frac{\delta}{\varrho_{\mathcal N}(\alpha)}, \quad \alpha > 0, \tag{4.1}$$

where $\bar\alpha$ is the largest number in $\Delta_q$ satisfying

$$\Theta_{\varrho_{\mathcal N}}(\bar\alpha) \le \eta(1 + \kappa)\delta. \tag{4.2}$$

According to the definition of $\alpha_{RG}$ we have $\alpha_{RG} \ge \bar\alpha$.

In order to estimate the first term on the right of (2.16) with $\alpha := \alpha_{RG}$, we observe
that when $\zeta \in Z_\kappa$, the parameter $\alpha_*$ determined by the RG-rule in Definition 3.1 with
$\delta(\alpha)$ given by (4.1) is equal to the parameter $\alpha_{RG}$ determined by the statistical RG-rule
in Definition 2.3. Therefore we may use Theorem 3.1 to conclude

$$\sup_{\zeta\in Z_\kappa}\|x^\dagger - x_{\alpha_{RG}}^\delta\| \le C\inf_{0<\alpha\le\alpha_0}\left\{\|x_\alpha - x^\dagger\| + \frac{(1 + \kappa)\delta}{\Theta_{\varrho_{\mathcal N}}(\alpha)}\right\}. \tag{4.3}$$

In the following we will estimate the second term on the right side of (2.16) with
$\alpha = \alpha_{RG}$. We need some auxiliary results.



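Lemma 4.1. Let Assumption 2.2 hold. Let $\bar\alpha = \alpha_0q^n \in \Delta_q$ be the largest parameter
satisfying (4.2). Then there is a constant $C$ such that

$$n \le C(1 + |\log(1/\delta)|).$$

Proof. Since $\|A\| > 0$ is the first eigenvalue of $A$, it follows from the definition of
$\mathcal N(\alpha)$ that

$$\mathcal N(\alpha) = \operatorname{Tr}\left[(\alpha I + A)^{-1}A\right] \ge \frac{\|A\|}{\alpha + \|A\|} \ge \frac{\|A\|}{\alpha_0 + \|A\|}, \quad 0 < \alpha \le \alpha_0.$$

Therefore, with $C_0 := \sqrt{(\alpha_0 + \|A\|)/\|A\|}$, we obtain

$$\Theta_{\varrho_{\mathcal N}}(\alpha) = \sqrt{\frac{\alpha}{\mathcal N(\alpha)}} \le C_0\,\alpha^{1/2}, \quad 0 < \alpha \le \alpha_0.$$

According to the definition of $\bar\alpha$ we have

$$\Theta_{\varrho_{\mathcal N}}(\bar\alpha/q) > \eta(1 + \kappa)\delta \ge \eta\delta.$$

Consequently $C_0(\bar\alpha/q)^{1/2} > \eta\delta$, which implies the result. □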

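We shall also use some prerequisites from Gaussian random elements in Banach
spaces, and we recall the following results from [11, Lemma 3.1 & Corollary 3.2].

Lemma 4.2. Let $\Xi$ be any Gaussian element in some Banach space. Then

$$\mathbb P\left[\|\Xi\| > \mathbb E[\|\Xi\|] + b\right] \le e^{-b^2/(2v^2)},$$

with $v^2 := \sup_{\|w\|\le 1}\mathbb E\left[\langle\Xi, w\rangle^2\right]$. Moreover, for each $p > 1$ there is a constant $C_p$ such
that

$$\left(\mathbb E\left[\|\Xi\|^p\right]\right)^{1/p} \le C_p\,\mathbb E[\|\Xi\|].$$

We apply Lemma 4.2 to $\Xi := s_\alpha^{1/2}(A)\zeta = s_\alpha^{1/2}(A)T^*\xi$. For fixed $\alpha \in \Delta_q$ we denote

$$Z_{\kappa,\alpha} := \left\{\zeta : \|s_\alpha^{1/2}(A)\zeta\| \le (1 + \kappa)\frac{1}{\varrho_{\mathcal N}(\alpha)}\right\}.$$

Corollary 4.1. For each $0 < \alpha \le \alpha_0$ there holds

$$\mathbb P\left[Z_{\kappa,\alpha}^c\right] \le \exp\left(-\kappa^2\mathcal N(\alpha)/2\right)$$

and

$$\left(\mathbb E\left[\|s_\alpha^{1/2}(A)\zeta\|^4\right]\right)^{1/4} \le \frac{C_4}{\varrho_{\mathcal N}(\alpha)}.$$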



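Proof. We first estimate $\mathbb P[Z_{\kappa,\alpha}^c]$. The expected norm of $\Xi$ can be bounded,
cf. (2.14), as

$$\mathbb E\left[\|s_\alpha^{1/2}(A)\zeta\|\right] \le \left(\mathbb E\left[\|s_\alpha^{1/2}(A)\zeta\|^2\right]\right)^{1/2} = \frac{1}{\varrho_{\mathcal N}(\alpha)}. \tag{4.4}$$

For any $w \in X$ with $\|w\| \le 1$, the weak second moments can be bounded from above
by

$$\mathbb E\left[\langle\Xi, w\rangle^2\right] = \mathbb E\left[\langle\xi, Ts_\alpha^{1/2}(A)w\rangle^2\right] = \|Ts_\alpha^{1/2}(A)w\|^2 \le \|Ts_\alpha^{1/2}(A)\|^2 \le \alpha.$$

Thus we may apply Lemma 4.2 with $b := \kappa/\varrho_{\mathcal N}(\alpha)$ to conclude that

$$\mathbb P\left[Z_{\kappa,\alpha}^c\right] \le \exp\left(-\frac{\kappa^2}{2\alpha\,\varrho_{\mathcal N}^2(\alpha)}\right) = \exp\left(-\kappa^2\mathcal N(\alpha)/2\right),$$

which completes the proof of the first assertion. The second one is a consequence
of (4.4) and Lemma 4.2. □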
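The following worked computation indicates why the particular tuning parameter $\kappa = \sqrt{8|\log(1/\delta)|/\mathcal N(\alpha_0)}$ is convenient. It assumes a tail bound of the form $\exp(-\kappa^2\mathcal N(\alpha)/2)$, as obtained from a Gaussian concentration bound $e^{-b^2/(2v^2)}$ with $b = \kappa/\varrho_{\mathcal N}(\alpha)$ and $v^2 \le \alpha$, and it assumes $0 < \delta \le 1$; both are assumptions of this sketch.

```latex
\begin{align*}
\mathcal N(\alpha) &\ge \mathcal N(\alpha_0)
  && \text{for } 0 < \alpha \le \alpha_0
     \text{ (}\mathcal N \text{ is decreasing)}, \\
\exp\!\left(-\tfrac{1}{2}\kappa^2 \mathcal N(\alpha)\right)
  &\le \exp\!\left(-\tfrac{1}{2}\kappa^2 \mathcal N(\alpha_0)\right)
   = \exp\!\left(-4\,|\log(1/\delta)|\right)
   = \delta^4 .
\end{align*}
```

Taking the fourth root, as required by the last factor in (2.16), then contributes a factor of order $\delta$ up to logarithmic terms.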

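Finally we turn to the proof of the main result.

Proof of Theorem 2.1. We will use (2.16) with $\alpha = \alpha_{RG}$. The first term on the
right has been estimated in (4.3). By using Lemma 4.1 and Corollary 4.1 we obtain
from $Z_\kappa^c = \bigcup_{\bar\alpha\le\alpha\in\Delta_q} Z_{\kappa,\alpha}^c$ that

$$\mathbb P[Z_\kappa^c] \le (n + 1)\sup_{\bar\alpha\le\alpha\in\Delta_q}\mathbb P[Z_{\kappa,\alpha}^c] \le C(1 + |\log(1/\delta)|)\,e^{-\kappa^2\mathcal N(\alpha_0)/2}.$$

For $\kappa = \sqrt{8|\log(1/\delta)|/\mathcal N(\alpha_0)}$ this yields

$$\mathbb P[Z_\kappa^c] \le C(1 + |\log(1/\delta)|)\,\delta^4 \le C\left(1 + \sqrt{|\log(1/\delta)|}\right)^4\delta^4. \tag{4.5}$$

It remains to establish a bound for $\mathbb E\left[\|x^\dagger - x_{\alpha_{RG}}^\delta\|^4\right]$. We emphasize that the random
element $x_{\alpha_{RG}}^\delta$ is no longer Gaussian in general, since the parameter $\alpha_{RG}$ depends
on the data $\zeta$. Hence we cannot apply Lemma 4.2 directly. Therefore we will use the
error bound (3.6), which is valid for every $\zeta$. By using the facts that $\alpha_{RG} \ge \bar\alpha$ and that
the function $\alpha \mapsto s_\alpha^{1/2}(t)/\alpha$ is decreasing for each $t > 0$, we obtain

$$\|x^\dagger - x_{\alpha_{RG}}^\delta\| \le \|x^\dagger - x_{\alpha_{RG}}\| + c_*\frac{\delta\,\|s_{\alpha_{RG}}^{1/2}(A)\zeta\|}{\alpha_{RG}}
\le \|x^\dagger - x_0\| + c_*\frac{\delta\,\|s_{\bar\alpha}^{1/2}(A)\zeta\|}{\bar\alpha}.$$

Since $\bar\alpha$ is deterministic, the element $s_{\bar\alpha}^{1/2}(A)\zeta$ is Gaussian. Thus we may use the bound
on the fourth moment of $\|s_{\bar\alpha}^{1/2}(A)\zeta\|$ given in Corollary 4.1 to obtain

$$\left(\mathbb E\left[\|x^\dagger - x_{\alpha_{RG}}^\delta\|^4\right]\right)^{1/4} \le \|x^\dagger - x_0\| + c_*C_4\frac{\delta}{\Theta_{\varrho_{\mathcal N}}(\bar\alpha)}.$$

Since the function $\alpha \mapsto \varrho_{\mathcal N}(\alpha)$ is decreasing, it is easy to obtain that $\Theta_{\varrho_{\mathcal N}}(\bar\alpha) \ge
q\,\Theta_{\varrho_{\mathcal N}}(\bar\alpha/q)$. By the definition of $\bar\alpha$ we then obtain $\Theta_{\varrho_{\mathcal N}}(\bar\alpha) \ge q\eta(1 + \kappa)\delta \ge q\eta\delta$.
Consequently

$$\left(\mathbb E\left[\|x^\dagger - x_{\alpha_{RG}}^\delta\|^4\right]\right)^{1/4} \le \|x^\dagger - x_0\| + \frac{c_*C_4}{q\eta}. \tag{4.6}$$

We combine the estimates (4.3), (4.5) and (4.6) with (2.16), and we use that $0 < \alpha \le \alpha_0$
yields $\Theta_{\varrho_{\mathcal N}}(\alpha_0)/\Theta_{\varrho_{\mathcal N}}(\alpha) \ge 1$. We can conclude that

$$\left(\mathbb E\left[\|x^\dagger - x_{\alpha_{RG}}^\delta\|^2\right]\right)^{1/2} \le C\inf_{0<\alpha\le\alpha_0}\left\{\|x_\alpha - x^\dagger\| + \frac{\delta\left(1 + \sqrt{|\log(1/\delta)|}\right)}{\Theta_{\varrho_{\mathcal N}}(\alpha)}\right\}.$$

The proof is therefore complete. □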